This notebook presents the plotting scripts and statistical analyses from the paper “Molecular determinants of dexamethasone vascular transport in COVID-19 therapy” by Shabalin et al. The plots shown here are interactive versions of those in the manuscript.

1 Data characteristics

The data for the study were taken from the supplementary materials of “An interpretable mortality prediction model for COVID-19 patients”, Nat. Mach. Intell. 2, 283–288 (2020), by Yan et al. The dataset describes COVID-19 patients admitted to Tongji Hospital, Wuhan, China between January 10 and February 18, 2020. The raw dataset contained information about 375 patients; however, only 373 patients had their albumin levels measured at least once during their hospital stay, and only those patients were of interest to the current study. The table below presents the basic statistics of the clinical variables analyzed in this study. These and other statistics, unless stated otherwise, are calculated from the last available blood sample of a given patient.

| No | Variable | Stats / Values | Freqs (% of Valid) | Missing |
|---|---|---|---|---|
| 1 | Outcome [factor] | 1. Died; 2. Survived | 174 (46.7%); 199 (53.3%) | 0 (0%) |
| 2 | Gender [factor] | 1. Male; 2. Female | 222 (59.5%); 151 (40.5%) | 0 (0%) |
| 3 | Age [numeric] | Mean (sd): 58.8 (16.5); min < med < max: 18 < 62 < 95; IQR (CV): 24 (0.3) | 71 distinct values | 0 (0%) |
| 4 | TimeSinceAdmission [numeric] | Mean (sd): 7.7 (5.9); min < med < max: 0 < 7.3 < 21.8; IQR (CV): 10.5 (0.8) | 342 distinct values | 0 (0%) |
| 5 | Albumin [numeric] | Mean (sd): 32.6 (6.3); min < med < max: 13.6 < 33 < 47.6; IQR (CV): 9.7 (0.2) | 183 distinct values | 0 (0%) |
| 6 | Glucose [numeric] | Mean (sd): 8.5 (5.2); min < med < max: 1 < 6.5 < 38.8; IQR (CV): 4.7 (0.6) | 284 distinct values | 0 (0%) |

The overall mortality rate was 46.65%, whereas the mortality rate among male and female patients was 56.76% and 31.79%, respectively.
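The rates above follow directly from the counts in the summary table. The short Python sketch below recomputes them. The per-gender death counts (126 men, 48 women) are not listed in the notebook; they are back-calculated here from the stated percentages and should be read as an assumption, as should the helper name `rate_percent`.

```python
# Counts taken from the summary table above.
n_total, n_died = 373, 174
n_male, n_female = 222, 151

# ASSUMPTION: deaths per gender are back-calculated from the reported
# rates (56.76% of 222 and 31.79% of 151); the notebook does not list them.
died_male, died_female = 126, 48

def rate_percent(events, total):
    """Return events / total as a percentage rounded to two decimals."""
    return round(100 * events / total, 2)

overall = rate_percent(n_died, n_total)
male = rate_percent(died_male, n_male)
female = rate_percent(died_female, n_female)

print(overall, male, female)
```

The back-calculated counts are at least internally consistent: they sum to the 174 deaths reported for the whole cohort.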

2 Albumin levels

2.1 Distributions and normality tests

The sina plot below presents the distribution of albumin levels among the two outcome groups of patients (Died, Survived), with the mean and standard deviation overlaid. The horizontal dashed lines represent the normal range for albumin (35-55 g/L).

To verify whether the albumin levels in both outcome groups are normally distributed, we first plotted Q-Q plots, as presented below. The shaded area represents the 95% confidence intervals.

After visual inspection of the Q-Q plots gave no indication of departures from normality, the normality of the albumin distributions was additionally checked with two Shapiro-Wilk tests at \(\alpha = 0.05\). The null hypothesis of the test is that the distribution is normal. Therefore, if we cannot reject the null hypotheses, we will assume that the distributions are normal.

Shapiro-Wilk test: Died

value
statistic.W 0.995438624983042
p.value 0.876037297780011
method Shapiro-Wilk normality test
data.name Outcome == Died

Shapiro-Wilk test: Survived

value
statistic.W 0.991932747653017
p.value 0.338673164623577
method Shapiro-Wilk normality test
data.name Outcome == Survived

Since the p-values of the Shapiro-Wilk tests are above 0.05, we cannot reject the null hypothesis that the albumin levels are normally distributed. Therefore, we assume the null hypothesis is true and the albumin levels for both outcome groups are normally distributed. Having verified the normality of the distributions, we can perform a two-tailed Welch’s t-test to check if the differences in albumin levels are statistically significant.

value
statistic.t -19.1881248432455
parameter.df 338.060954028682
p.value 4.94695446316735e-56
conf.int1 -9.91937925851427
conf.int2 -8.07476964693251
estimate.mean of x 27.7752873563218
estimate.mean of y 36.7723618090452
null.value.difference in means 0
stderr 0.468887633691339
alternative two.sided
method Welch Two Sample t-test
data.name Outcome == Died vs Outcome == Survived

With a p-value < 0.001 we can reject the null hypothesis that the means are equal, and state that the difference in mean albumin levels between the patients who died and those who survived is statistically significant. The mean albumin level for those who survived was 36.77 g/L, whereas for those who died it was 27.78 g/L.
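As a reminder of what the test computes, here is a minimal Python sketch of Welch's t statistic and the Welch-Satterthwaite degrees of freedom. The function name `welch_t` and the toy inputs are hypothetical. The group standard deviations are not part of the output above, so the reported values are used only for a consistency check: the t statistic should equal the difference in means divided by the reported standard error, and the half-width of the confidence interval divided by that standard error gives the critical value that was applied.

```python
import math

def welch_t(mean_x, sd_x, n_x, mean_y, sd_y, n_y):
    """Welch's t statistic, Welch-Satterthwaite df, and standard error."""
    vx, vy = sd_x**2 / n_x, sd_y**2 / n_y   # squared standard error of each mean
    se = math.sqrt(vx + vy)
    t = (mean_x - mean_y) / se
    df = (vx + vy) ** 2 / (vx**2 / (n_x - 1) + vy**2 / (n_y - 1))
    return t, df, se

# Consistency check with the values reported in the output above.
mean_died, mean_survived = 27.7752873563218, 36.7723618090452
stderr = 0.468887633691339
ci_low, ci_high = -9.91937925851427, -8.07476964693251

t_check = (mean_died - mean_survived) / stderr    # should match statistic.t
crit_implied = (ci_high - ci_low) / 2 / stderr    # critical t value behind the CI

# HYPOTHETICAL toy input: equal SDs and equal group sizes of 10.
t_toy, df_toy, se_toy = welch_t(0.0, 1.0, 10, 1.0, 1.0, 10)

print(t_check, crit_implied, t_toy, df_toy)
```

With equal variances and equal group sizes, the Welch-Satterthwaite formula reduces to the pooled degrees of freedom (18 in the toy case), which is a quick way to sanity-check an implementation.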

2.2 Gender

The difference in albumin levels was also evaluated separately for each gender. The shapes and means of the albumin distributions are practically identical to those of the overall patient cohort. Nevertheless, the proportions of patients who died and survived differ between the genders. This reflects the differences in mortality rates mentioned earlier (56.76% and 31.79% for males and females, respectively).

For each gender, the difference in mean albumin levels between patients who died and those who survived was statistically significant according to a two-tailed Welch’s t-test at \(\alpha = 0.05\) (p < 0.001).

Shapiro-Wilk test: Male Died

value
statistic.W 0.994896232157248
p.value 0.933768572264352
method Shapiro-Wilk normality test
data.name Male: Outcome == Died

Shapiro-Wilk test: Male Survived

value
statistic.W 0.982301118999149
p.value 0.222347078792031
method Shapiro-Wilk normality test
data.name Male: Outcome == Survived

Welch’s t-test: Male

value
statistic.t -13.7391493435378
parameter.df 212.274917998268
p.value 3.75705427781707e-31
conf.int1 -9.75071978098348
conf.int2 -7.30384371108002
estimate.mean of x 28.0039682539683
estimate.mean of y 36.53125
null.value.difference in means 0
stderr 0.620655728590837
alternative two.sided
method Welch Two Sample t-test
data.name Male: Outcome == Died vs Outcome == Survived

Shapiro-Wilk test: Female Died

value
statistic.W 0.970911382714337
p.value 0.274776893133905
method Shapiro-Wilk normality test
data.name Female: Outcome == Died

Shapiro-Wilk test: Female Survived

value
statistic.W 0.983523008743715
p.value 0.230027042682121
method Shapiro-Wilk normality test
data.name Female: Outcome == Survived

Welch’s t-test: Female

value
statistic.t -11.9560304977694
parameter.df 71.3045086774262
p.value 1.05671199130413e-18
conf.int1 -11.460025381254
conf.int2 -8.1841493760276
estimate.mean of x 27.175
estimate.mean of y 36.9970873786408
null.value.difference in means 0
stderr 0.821517424238191
alternative two.sided
method Welch Two Sample t-test
data.name Female: Outcome == Died vs Outcome == Survived

2.4 Albumin vs glucose levels

Another variable that was taken into account was the patient’s glucose level. The scatter plot below presents the relation between albumin levels (y-axis) and glucose levels (x-axis), with color denoting the outcome groups of patients (red: Died, blue: Survived). The horizontal dashed lines represent the normal range for albumin (35-55 g/L), whereas the vertical dashed lines show the fasting glucose normal range (4.0-5.5 mmol/L) and the random plasma test diabetes threshold (11.1 mmol/L).

It can be noticed that both albumin and glucose levels can be associated with COVID-19 outcome. The relation will be analyzed as part of the logistic regression analysis.

2.5 Albumin vs age

Similarly to glucose levels, the scatter plot below presents the relation between albumin levels (y-axis) and age (x-axis), with color denoting the outcome groups of patients (red: Died, blue: Survived).

Once again, it can be noticed that age can be associated with albumin levels and COVID-19 outcome. This relation will also be analyzed as part of the logistic regression analysis.

3 Regression analysis

To verify the statistical significance of associations between the variables analyzed in this study (albumin levels, glucose levels, gender, age) and the outcome (Died, Survived), we used logistic regression. We will first create unadjusted models to see the relation between each single variable and the outcome. After that, we will create an adjusted model in which all the variables are taken into account as potential confounders. Finally, we will verify whether, apart from confounding, there is any effect modification between albumin levels and the other variables.
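For reference, the three kinds of models described above can be written in generic form as follows. This is a sketch rather than a transcription of the notebook: \(p\) denotes the probability of the outcome level coded as 1, \(x\) a single predictor, \(z\) the variable tested for effect modification, and the symbols \(\beta_j\) are illustrative names for the fitted coefficients.

\[
\begin{aligned}
\text{unadjusted:} \quad & \operatorname{logit}(p) = \ln\frac{p}{1-p} = \beta_0 + \beta_1 x \\
\text{adjusted:} \quad & \operatorname{logit}(p) = \beta_0 + \beta_1\,\text{Albumin} + \beta_2\,\text{Glucose} + \beta_3\,\text{Age} + \beta_4\,\text{Gender} \\
\text{effect modification:} \quad & \operatorname{logit}(p) = \beta_0 + \beta_1\,\text{Albumin} + \beta_2 z + \beta_3\,(\text{Albumin} \times z)
\end{aligned}
\]

In a logistic model, the odds ratio for a predictor is \(e^{\beta_j}\), so an odds ratio of 1 corresponds to \(\beta_j = 0\), that is, no association.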

3.1 Unadjusted models

Below are the results of logistic regression for each variable. All the unadjusted models were found to be statistically significant at \(\alpha = 0.05\), with p < 0.001.

Albumin

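Dependent variable: OutcomeNumber

| Predictors | Odds Ratios | CI | p |
|---|---|---|---|
| (Intercept) | 0.00 *** | 0.00 – 0.00 | <0.001 |
| Albumin | 1.56 *** | 1.44 – 1.71 | <0.001 |
| Observations | 373 | | |
| R2 Tjur | 0.551 | | |

Significance: * p<0.05   ** p<0.01   *** p<0.001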

Gender

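Dependent variable: OutcomeNumber

| Predictors | Odds Ratios | CI | p |
|---|---|---|---|
| (Intercept) | 0.76 * | 0.58 – 0.99 | 0.045 |
| Gender [Female] | 2.82 *** | 1.83 – 4.37 | <0.001 |
| Observations | 373 | | |
| R2 Tjur | 0.060 | | |

Significance: * p<0.05   ** p<0.01   *** p<0.001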
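For a single binary predictor, the unadjusted odds ratio can be reproduced from a 2x2 table of counts. The Python sketch below does this for gender. Two assumptions apply. First, the cell counts are back-calculated from the group sizes and mortality rates reported earlier. Second, the model is taken to describe the odds of survival, inferred from the fact that 96/126 matches the intercept of 0.76. The Wald interval computed here is a textbook approximation and need not coincide exactly with the interval in the table, which may have been obtained by another method.

```python
import math

# ASSUMPTION: cell counts back-calculated from the reported group sizes
# (222 males, 151 females) and mortality rates (56.76%, 31.79%).
survived = {"Male": 96, "Female": 103}
died = {"Male": 126, "Female": 48}

# Odds of survival in each group; the male odds play the role of the intercept.
odds_male = survived["Male"] / died["Male"]
odds_female = survived["Female"] / died["Female"]
odds_ratio = odds_female / odds_male

# Wald 95% confidence interval, built on the log-odds scale.
se_log_or = math.sqrt(sum(1 / n for n in (*survived.values(), *died.values())))
z = 1.959963984540054  # 97.5th percentile of the standard normal distribution
ci_low = math.exp(math.log(odds_ratio) - z * se_log_or)
ci_high = math.exp(math.log(odds_ratio) + z * se_log_or)

print(round(odds_male, 2), round(odds_ratio, 2), round(ci_low, 2), round(ci_high, 2))
```

Rounded to two decimals, the male odds and the odds ratio agree with the intercept (0.76) and the Gender [Female] estimate (2.82) in the table above.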

Age

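Dependent variable: OutcomeNumber

| Predictors | Odds Ratios | CI | p |
|---|---|---|---|
| (Intercept) | 479.80 *** | 133.55 – 2023.14 | <0.001 |
| Age | 0.90 *** | 0.88 – 0.92 | <0.001 |
| Observations | 373 | | |
| R2 Tjur | 0.332 | | |

Significance: * p<0.05   ** p<0.01   *** p<0.001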

Glucose

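Dependent variable: OutcomeNumber

| Predictors | Odds Ratios | CI | p |
|---|---|---|---|
| (Intercept) | 10.30 *** | 5.76 – 19.34 | <0.001 |
| Glucose | 0.76 *** | 0.70 – 0.81 | <0.001 |
| Observations | 373 | | |
| R2 Tjur | 0.227 | | |

Significance: * p<0.05   ** p<0.01   *** p<0.001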
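The odds ratios for the numeric predictors refer to a change of one unit (for example 1 g/L of albumin). Because a logistic model is linear on the log-odds scale, the odds ratio for a larger change is the per-unit value raised to the corresponding power. The Python sketch below illustrates this with the rounded estimates from the tables above. The helper name `rescale_odds_ratio` and the chosen step sizes are illustrative, age is assumed to be recorded in years, and the results inherit the rounding of the inputs.

```python
def rescale_odds_ratio(or_per_unit, units):
    """Odds ratio for a change of `units`, given the per-unit odds ratio."""
    return or_per_unit ** units

# Rounded per-unit odds ratios from the unadjusted models above.
albumin_per_5 = rescale_odds_ratio(1.56, 5)    # per 5 g/L of albumin
age_per_10 = rescale_odds_ratio(0.90, 10)      # per 10 years (ASSUMPTION: age in years)
glucose_per_2 = rescale_odds_ratio(0.76, 2)    # per 2 mmol/L of glucose

print(round(albumin_per_5, 2), round(age_per_10, 2), round(glucose_per_2, 2))
```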

3.2 Adjusted model

The adjusted model has shown that albumin levels are statistically significantly associated with COVID-19 outcome (p < 0.001), even when confounding factors are taken into account.

Dependent variable: OutcomeNumber

| Predictors | Odds Ratios | CI | p |
|---|---|---|---|
| (Intercept) | 0.00 *** | 0.00 – 0.02 | <0.001 |
| Albumin | 1.51 *** | 1.37 – 1.69 | <0.001 |
| Glucose | 0.89 ** | 0.81 – 0.96 | 0.006 |
| Age | 0.92 *** | 0.89 – 0.94 | <0.001 |
| Gender [Female] | 2.19 * | 1.04 – 4.72 | 0.041 |
| Observations | 373 | | |
| R2 Tjur | 0.676 | | |

Significance: * p<0.05   ** p<0.01   *** p<0.001
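The "R2 Tjur" row reports Tjur's coefficient of discrimination: the mean predicted probability among observations with outcome 1 minus the mean predicted probability among observations with outcome 0. The Python sketch below shows the calculation on a hypothetical toy example. The function name `tjur_r2` and the data are made up for illustration, since the fitted probabilities are not part of this notebook's output.

```python
from statistics import mean

def tjur_r2(outcomes, predicted):
    """Tjur's R2: mean predicted probability for observed 1s minus that for observed 0s."""
    p_ones = [p for y, p in zip(outcomes, predicted) if y == 1]
    p_zeros = [p for y, p in zip(outcomes, predicted) if y == 0]
    return mean(p_ones) - mean(p_zeros)

# HYPOTHETICAL toy data: four observations with their fitted probabilities.
y = [1, 1, 0, 0]
p_hat = [0.9, 0.7, 0.4, 0.2]

r2 = tjur_r2(y, p_hat)
print(round(r2, 3))
```

A value near 0 means the model assigns similar probabilities to both outcome groups, whereas a value near 1 means the groups are almost perfectly separated.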

3.3 Testing for effect modification

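Tests for interactions between albumin levels and the other variables did not show any statistically significant effect modification at \(\alpha = 0.05\): the interaction terms had p = 0.173 (gender), p = 0.772 (age), and p = 0.096 (glucose).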

Albumin and gender

Dependent variable: OutcomeNumber

| Predictors | Odds Ratios | CI | p |
|---|---|---|---|
| (Intercept) | 0.00 *** | 0.00 – 0.00 | <0.001 |
| Albumin | 1.48 *** | 1.35 – 1.66 | <0.001 |
| Gender [Female] | 0.03 | 0.00 – 13.43 | 0.285 |
| Albumin * Gender [Female] | 1.15 | 0.95 – 1.42 | 0.173 |
| Observations | 373 | | |
| R2 Tjur | 0.570 | | |

Significance: * p<0.05   ** p<0.01   *** p<0.001
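In a model with an interaction term, the albumin odds ratio applies to the reference gender (Male), and the ratio for the other gender is obtained by multiplying it by the interaction odds ratio. The Python sketch below uses the rounded values from the table above. The variable names are illustrative, and because the interaction term is not statistically significant (p = 0.173), the resulting difference is compatible with no effect modification.

```python
# Rounded odds ratios from the "Albumin and gender" table above.
or_albumin = 1.48        # per 1 g/L of albumin, reference gender (Male)
or_interaction = 1.15    # Albumin * Gender [Female]

# Per-unit albumin odds ratio within each gender.
or_albumin_male = or_albumin
or_albumin_female = or_albumin * or_interaction

print(round(or_albumin_male, 2), round(or_albumin_female, 2))
```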

Albumin and age

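Dependent variable: OutcomeNumber

| Predictors | Odds Ratios | CI | p |
|---|---|---|---|
| (Intercept) | 0.00 | 0.00 – 71.37 | 0.173 |
| Albumin | 1.68 * | 1.03 – 2.89 | 0.049 |
| Age | 0.95 | 0.74 – 1.24 | 0.722 |
| Albumin * Age | 1.00 | 0.99 – 1.01 | 0.772 |
| Observations | 373 | | |
| R2 Tjur | 0.658 | | |

Significance: * p<0.05   ** p<0.01   *** p<0.001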

Albumin and glucose

Dependent variable: OutcomeNumber

| Predictors | Odds Ratios | CI | p |
|---|---|---|---|
| (Intercept) | 0.00 *** | 0.00 – 0.00 | <0.001 |
| Albumin | 1.73 *** | 1.42 – 2.09 | <0.001 |
| Glucose | 1.44 | 0.70 – 2.39 | 0.242 |
| Albumin * Glucose | 0.98 | 0.97 – 1.01 | 0.096 |
| Observations | 373 | | |
| R2 Tjur | 0.590 | | |

Significance: * p<0.05   ** p<0.01   *** p<0.001

3.4 Regression based on gender

Finally, for gender, the only categorical confounding variable, we plotted a logistic regression plot with separate lines for males and females. It can be noted that gender makes a difference only at high albumin levels. In other words, low albumin levels are an equally strong predictor of death from COVID-19 for both genders.

4 Citing

If you find this analysis useful, please cite: Ivan G. Shabalin, Mateusz P. Czub, Karolina A. Majorek, Dariusz Brzezinski, Marek Grabowski, David R. Cooper, Mateusz Panasiuk, Maksymilian Chruszcz, Wladek Minor, “Molecular determinants of dexamethasone vascular transport in COVID-19 therapy”, in review.